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Power  Allocation  for  a  MIMO  Relay  System 
with  Multiple- Antenna  Users 

Yuan  Yu  and  Yingbo  Hua,  Fellow,  IEEE 


Abstract 

A  power  allocation  or  scheduling  problem  ’  is  studied  for  a  multiuser  MIMO  wireless  relay  system 
where  there  is  a  non-regenerative  relay  between  one  access  point  and  multiple  users.  Each  node  in  the 
system  is  equipped  with  multiple  antennas.  The  purpose  of  this  study  is  to  develop  fast  algorithms  to 
compute  the  source  covariance  matrix  (or  matrices)  and  the  relay  transformation  matrix  to  optimize 
a  system  performance.  We  consider  the  minimization  of  power  consumption  subject  to  rate  constraint 
and  also  the  maximization  of  system  throughput  subject  to  power  constraint.  These  problems  are  non- 
convex  and  apparently  have  no  simple  solutions.  In  this  paper,  a  number  of  computational  strategies 
are  presented  and  their  performances  are  investigated.  Both  uplink  and  downlink  cases  are  considered. 
The  use  of  multiple  carriers  is  also  discussed.  Moreover,  a  generalized  water-filling  (GWF)  algorithm  is 
developed  to  solve  a  special  class  of  convex  optimization  problems.  The  GWF  algorithm  is  used  for  two 
of  the  strategies  shown  in  this  paper. 


Index  Terms 

Network  of  MIMO  links,  medium  access  control,  space-time  power  allocation,  space-time  power 
scheduling,  multiuser  MIMO  relays,  convex  optimization,  non-convex  optimization,  generalized  water 
filling. 
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former  stresses  the  computation  of  the  latter  in  advance. 
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I.  Introduction 

Wireless  relays  are  known  to  be  useful  to  increase  the  coverage  of  wireless  communications  under 
power  and  spectral  constraints.  A  wireless  relay  can  be  regenerative  or  non-regenerative.  A  regenerative 
relay  requires  digital  decoding  and  re-encoding  at  the  relay,  which  can  cause  a  significant  increase  of 
delay  and  complexity.  A  non-regenerative  relay  does  not  need  any  digital  decoding  and  re-encoding  at 
the  relay,  which  is  a  useful  advantage  over  regenerative  relays. 

Recently,  there  have  been  many  research  efforts  on  non-regenerative  MIMO  relay  systems  [1],  [2], 
[3],  [4],  [5],  [6],  [7],  [8],  [9].  A  non-regenerative  MIMO  relay  applies  a  transformation  matrix,  also 
called  relay  matrix,  to  its  received  signal  vector  and  then  forwards  it  to  the  next  node.  The  MIMO  relay 
formulation  in  [3]  includes  the  multicarrier  relay  problem  in  [10]  as  a  special  case.  This  paper  continues 
to  address  non-regenerative  MIMO  relay  systems.  In  particular,  we  consider  power  allocation  problems. 
In  the  context  of  MIMO  relays,  a  power  allocation  problem  is  about  the  determination  of  the  source 
covariance  matrix  and  the  relay  matrix  to  maximize  a  system  performance. 

For  a  single-user  two-hop  MIMO  relay  system,  an  optimal  structure  of  the  relay  matrix  that  maximizes 
the  source-to-destination  mutual  information  was  presented  in  [1]  and  [2],  and  an  optimal  structure  for 
both  the  source  covariance  matrix  and  the  relay  matrix  was  established  in  [3].  The  optimality  of  this 
structure,  which  is  essentially  a  diagonalization  or  decoupling  of  the  entire  relay  system  into  a  set  of 
parallel  scalar  sub-systems,  is  recently  established  in  [5]  for  a  broader  class  of  objective  functions  known 
as  Schur-convex  or  Schur-concave  functions.  Furthermore,  this  elegant  structure  is  also  shown  in  [6]  to 
be  optimal  for  a  multi-hop  MIMO  relay  system  of  any  number  of  hops. 

For  multiuser  MIMO  relay  systems,  however,  the  above  mentioned  property  does  not  hold  any  more. 
Finding  the  source  covariance  matrix  and  the  relay  matrix  to  maximize  a  system  performance  is  generally 
a  difficult  task.  Prior  efforts  on  multiuser  MIMO  relay  systems  are  reported  in  [7],  [8]  and  [9].  In  these 
works,  each  user  is  assumed  to  have  a  single  antenna.  Part  of  the  reason  for  this  assumption  was  to 
simplify  the  problem.  Additional  references  on  MIMO  relays  can  be  found  in  [11]. 

In  this  paper,  we  focus  on  a  multiuser  two-hop  MIMO  relay  system  where  each  node  is  equipped 
with  multiple  antennas.  For  this  problem,  not  only  the  diagonal  structure  as  shown  in  [1],  [2],  [3]  and 
[5]  is  no  longer  optimal,  but  also  the  uplink-downlink  duality  property  shown  in  [12]  and  [9]  no  longer 
applies.  This  makes  the  optimal  power  allocation  a  difficult  task.  Facing  the  challenge  unsolved  by  others, 
we  will  present  a  number  of  computational  strategies  to  search  for  the  best  possible  power  allocation. 
We  will  consider  both  uplink  and  downlink  problems.  We  will  also  consider  both  system  throughput 
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maximization  and  power  consumption  minimization.  These  algorithms  are  summarized  in  Table  I  and 
discussed  in  detail  in  this  paper.  These  algorithms  are  designed  to  solve  the  power  allocation  problems 
more  general  than  those  treated  before.  In  particular,  for  a  problem  treated  in  [7],  our  approach  can  yield 
much  better  results  than  the  approach  developed  there. 

We  assume  that  all  channel  matrices  are  known  to  a  central  scheduler  and  to  the  transmitters  and 
receivers  if  needed.  Except  for  Algorithm  1,  all  other  algorithms  in  Table  I  are  not  mathematically  proven 
to  yield  globally  optimal  results  for  their  corresponding  problems.  However,  Algorithm  1  is  based  on  a 
reformulation  of  the  original  problem,  which  essentially  approximates  the  original  non-convex  problem 
by  a  convex  problem.  Because  of  this  approximation,  there  is  a  significant  penalty  to  the  performance  of 
Algorithm  1  as  shown  later  in  Section  VI. 

We  will  also  develop  a  generalized  water  filling  (GWF)  fheorem  and  fhe  corresponding  GWF  algorifhm 
fo  solve  wifh  global  opfimalify  a  special  fype  of  convex  opfimizafion  problems.  The  GWF  algorifhm  is  a 
useful  building  block  for  fwo  of  fhe  power  allocafion  algorifhms  summarized  in  Table  I.  In  fhe  liferafure 
fhere  are  ofher  types  of  algorifhms  also  called  generalized  wafer  filling.  Buf  fhey  were  acfually  designed 
for  differenl  problems.  Our  GWF  algorifhm  is  a  generalization  of  fhe  conventional  wafer  filling  algorifhm 
from  single  power  consfrainf  fo  mulfiple  power  consfrainfs. 

In  Section  II,  fhe  GWF  fheorem  is  presenfed.  In  Section  III,  we  freaf  a  mulfiuser  MIMO  relay  downlink 
sysfem.  We  presenf  power  allocafion  algorifhms  for  maximizing  fhe  sysfem  fhroughpuf  (i.e.,  sum  rafe) 
under  a  power  consfrainf,  and  power  allocafion  algorifhms  for  minimizing  fhe  sysfem  power  consumpfion 
under  individual  user  rale  consfrainfs.  In  Secfion  IV,  we  deal  wifh  similar  issues  for  fhe  uplink  case.  In 
Secfion  V,  we  show  how  fo  apply  our  algorifhms  for  joinl  multicarrier  power  allocafion.  In  Secfion  VI, 
simulalions  resulls  are  presenfed  fo  illuslrale  fhe  performances  of  our  algorifhms.  This  sludy  confirms  lhal 
power  allocafion  affecls  fhe  sysfem  performance  significanlly  and  developing  fasl  algorifhms  for  power 
allocafion  is  crilically  imporlanf. 

II.  A  Generalized  Water-Filling  Algorithm 

Consider  the  following  convex  optimization  problem: 

min  J  = -logll  +  HQH^I  (1) 

Q>0  'VI 

s.t.  tr{BiQ,'B^}  <  Pi,  VfG{l...m} 

where  H  and  Bj  are  complex  matrices,  Q  is  a  complex  positive  semi-definite  matrix,  and  Pi  are  positive 
numbers.  Without  its  base  specified,  log  has  fhe  nafural  base  e.  If  m  =  1,  fhe  solufion  fo  fhe  above 


January  23,  2010 


DRAFT 


4 


problem  can  be  found  by  a  well  known  water-filling  algorithm.  It  is  a  fast  algorithm  for  this  particular 
case.  If  m  >  1,  however,  there  appears  no  fast  algorithm  available  in  the  prior  literature  except  for  the 
general  purpose  convex  optimization  programs  such  as  the  CVX  package  designed  for  Matlab  [13].  We 
now  introduce  a  special  purpose  algorithm,  referred  to  as  generalized  water-filling  (GWF)  algorithm,  to 
solve  the  problem  in  (1).  The  GWF  algorithm  is  based  on  the  following  GWF  theorem: 

Theorem  1:  The  solution  to  (1)  is  given  by: 

Q  =  K"'^V(I  -  (2) 

where  K  =  ^  (assumed  to  be  non-singular),  V  and  X)  are  determined  from  the  SVD 

HK~^  =  UX)V^,  (•)+  replaces  all  negative  diagonal  elements  by  zeros  and  leaves  all  non-negative 
diagonal  elements  unchanged,  and  /x  =  (/xi,  •  •  •  ,  Hm)  are  the  solution  to  the  following  dual  problem: 

m 

max  -  log  |I  -h  HQH^I  -f  /Xi(fr(BiQBf )  -  Pi)  (3) 

/x>0  ^ ^ 

i=l 

s.t.  Q  =  K-^V(I  - 

To  our  knowledge,  this  theorem  is  new.  The  proof  of  this  theorem  and  an  algorithm  for  computing 
fi  are  given  in  Appendices  A  and  B,  respectively.  A  complete  Matlab  script  of  the  GWF  algorithm  is 
available  at  http://www.ee.ucr.edu/-yhua/GWF.pdf.  As  illustrated  by  a  simulation  example  in  Appendix 
C,  the  GWF  algorithm  can  achieve  the  same  accuracy  as  CVX,  and  the  former  has  a  much  faster  speed 
than  the  latter  when  the  dimension  of  fi  is  much  smaller  than  that  of  Q.  The  GWF  algorithm  is  useful  for 
more  applications  than  those  shown  in  this  paper.  For  example,  if  one  wants  to  design  a  source  covariance 
matrix  to  maximize  the  data  rate  of  a  MIMO  link  and  also  wants  to  keep  the  interference  from  this  source 
to  other  neighboring  nodes  under  certain  limits,  such  a  problem  can  be  directly  formulated  as  (1). 

III.  Multiuser  MIMO  Downlink  Relay 

We  first  consider  the  multiuser  MIMO  downlink  relay  system  as  illustrated  in  Fig.  1,  where  x  G 
denotes  the  signal  transmitted  from  the  source  equipped  with  M  antennas,  F  G  transformation 

matrix  performed  by  the  non-regenerative  relay  also  equipped  with  M  antennas,  and  y*  G  the  signal 
received  by  the  user  i  equipped  with  N  antennas.  Furthermore,  H  G  denotes  the  channel  matrix 

between  the  source  and  the  relay,  Hj  G  is  the  channel  matrix  between  the  relay  and  the  user  i, 

and  n,  ni,  •  •  • ,  are  the  zero-mean  Gaussian  noises  at  the  relay  and  the  K  users.  Here,  we  assume 
that  all  the  users  are  equipped  with  the  same  number  of  antennas.  The  transmission  from  the  source  to 
the  relay  is  assumed  to  be  orthogonal  (in  time  and/or  frequency)  to  the  transmission  from  the  relay  to 


January  23,  2010 


DRAFT 


5 


all  users.  We  also  assume  that  the  direct  link  between  the  source  and  any  of  the  users  is  very  weak  and 
negligible . 

Note  that  if  the  actual  numbers  of  antennas  at  the  users,  relay  or  source  are  different  from  what  is 
described  above,  we  can  always  add  imaginary  dummy  antennas  to  make  up  the  number  M  or  N.  The 
effective  H  G  (jAfxM  jj.  ^  qNxM  ^ero  rows  or  zero  columns,  which  however  do  not  affect 

the  expressions  of  our  results. 

The  signal  y  received  at  the  relay,  the  signal  r  transmitted  from  the  relay,  and  the  signal  y*  received 
by  the  user  i  can  be  expressed  as  follows: 

y  =  Hx  +  n  (4) 

r  =  Fy  =  FHx  +  Fn  (5) 

yj  =  Hjr  +  rij  =  HjFHx  +  HjFn  +  rij  (6) 

If  n  has  a  covariance  matrix  C^,  we  can  write  Cn^'^^y  =  where  the  noise  term 

Cn  '  n  has  the  covariance  matrix  equal  to  the  identity  matrix.  So,  provided  that  the  noise  covariance 
matrices  of  n  and  rij  are  known,  we  can  assume  for  convenience  that  they  are  the  identity  matrices.  We 
now  define  =  [Hf^, . . . ,  H^],  =  [yf^, . . . ,  y^]  and  =  [nf^, . . . ,  n^].  Then,  using  (6)  for  all 

i,  we  have 

yc  =  HcFHx  +  HcFn  +  ric  (7) 

This  is  an  effective  channel  model  between  the  source  and  all  users. 

A.  Maximization  of  Sum  Rate  under  Power  Constraint  and  ZFDPC  (Algorithms  1-2) 

The  problem  of  maximizing  the  sum  rate  for  all  users  under  a  power  constraint  for  the  downlink  case 
was  considered  in  [7]  where  each  user  has  a  single  antenna.  The  authors  also  assume  the  use  of  zero 
forcing  dirty  paper  coding  (ZFDPC)  [14].  We  now  extend  the  approach  in  [7]  to  users  with  multiple 
antennas. 

Define  the  QR  decomposition  of  the  KN  x  M  matrix  He  as  He  =  RQ,  where  Q  is  an  M  x  M  unitary 
matrix  (which  is  not  the  same  Q  in  section  II)  and  R  is  a  x  M  lower  triangular  matrix.  Define  the 
SVD  of  the  channel  matrix  H  as  H  =  where  S/,  =  =  diag{Xh,i,  Xh,2,  ■  ■  ■ , 

with  descending  diagonal  elements,  and  U/i  and  Yh  are  unitary. 

We  assume  that  the  source  precoder  generates  x  =  where  s  contains  i.i.d.  symbols  of  unit 
variance  and  A^;  is  such  that  the  source  covariance  matrix  is  H^;  =  F{xx^}  =  AxA^  =  YhA^Y^ 
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with  A.X  =  diag{Xx,i,  Xx,2,  •  •  •  j  Xx,m)-  We  also  assume  that  the  relay  matrix  is  constructed  as 

F  =  Q^5];Uf,  I]^  =  Ay'  =  dza5(Ay,i,A/,2,...,Ay,^)i/2  (8) 

Here,  the  source  covariance  matrix  is  matched  to  the  right  singular  vectors  of  the  channel  matrix  H,  the 
optimality  of  which  for  a  single  user  relay  system  is  shown  in  [3].  The  relay  matrix  here  is  matched  to 
the  left  singular  vectors  of  H  and  the  unitary  matrix  Q  of  He,  which  is  adopted  only  heuristically  without 
proof  of  optimality.  As  mentioned  in  [7],  the  matrix  Q  is  also  affected  hy  column  permutations  of  He, 
which  can  he  further  optimized.  With  the  above  structures  of  the  precoder  A^,  and  the  relay  matrix  F, 
(7)  becomes 


Ye  =  RH/UftS  +  (RH/ii  +  rie) 


(9) 


where  n  =  U^n.  Note  that  each  element  of  s  represents  a  scalar  stream  of  data.  Since  R  is  lower 
triangular,  it  is  clear  from  the  first  term  of  (9)  that  the  interference  from  stream  j  to  stream  i  for  j  >  i 
is  now  absent.  To  remove  the  interference  from  stream  j  to  stream  i  for  j  <  i,  we  can  use  the  dirty 
paper  coding  (DPC)  starting  from  the  first  stream  that  corresponds  to  the  first  element  of  s  in  (9).  For 
the  first  stream,  there  is  no  interference  from  other  streams  and  the  conventional  coding  is  applied.  For 
the  second  stream,  there  is  the  interference  from  the  first  stream  which  is  however  known  to  the  encoder. 
With  DPC,  the  interference  from  the  first  stream  to  the  second  stream  can  be  virtually  eliminated.  The 
same  principle  applies  to  the  remaining  streams.  Then,  with  DPC,  the  effective  signal  to  noise  ratio  for 
the  zth  data  stream  is 


SNR,  = 


|Ri,i|  Xj^iXh^iXx^i 

Si=l  1 


(10) 


where  Rij  is  the  (i,j)th  element  of  R.  Note  that  the  use  of  DPC  has  removed  the  mutual  interference 
between  the  elements  of  s.  But  the  first  term  (the  sum)  in  the  denominator  of  (10)  is  due  to  the  noise 
forwarded  from  the  relay.  The  above  interference  cancellation  method  based  on  the  QR  decomposition 
and  the  DPC  is  known  as  zero  forcing  dirty  paper  coding  (ZFDPC)  [14]. 

The  problem  of  maximizing  the  sum  rate  of  this  downlink  relay  system  under  ZFDPC  can  now  be 
formulated  as 


KN 


max 

A,,  Ax 

^'sum,d  —  log2(l  +  SNRi) 

(11) 

S.t. 

tr{Ax}  <  Px 

(12) 

tr{Af{AhAx  +  I)}  <  Pf 

(13) 
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where  the  power  constraint  (12)  is  for  the  source,  and  the  power  constraint  (13)  is  for  the  relay.  In  [7], 
the  problem  (11)  is  solved  hy  a  geometric  programming  under  a  high  SNR  approximation,  which  will 
he  referred  to  as  Algorithm  1.  Note  that  a  weighted  sum  rate  can  he  used  for  all  sum  rate  maximization 
algorithms.  But  for  convenience,  we  choose  the  unit  weights. 

Next,  we  present  an  algorithm  without  the  high-SNR  assumption,  referred  to  as  Algorithm  2.  We  will 
search  for  Ay  and  in  an  alternate  fashion,  where  each  cycle  of  the  alternation  is  as  follows. 

1)  Source  optimization  with  fixed  Af:  It  is  easy  to  verify  that  with  any  fixed  Ay,  the  problem  (11) 
is  a  special  case  of  the  problem  (1)  shown  in  Section  II,  and  hence  the  optimal  A^,  can  be  found  by  the 
GWF  algorithm. 

2)  Relay  optimization  with  fixed  A^:  With  any  fixed  A^;,  the  optimal  Ay  can  be  found  by  maximizing 
the  following  penalized  function  of  (11): 


KN 


■^1  (^/)  =  ^  1  + 


X]7=l 


1 


log  (Pf-Y.  3“  1) 


(14) 


where  the  second  term  is  the  logarithmic  barrier  function  [15]  associated  with  the  constraint  (13).  For 
convenience,  we  will  also  write  Li  (Ay)  =  Li  (Ay)  where  Ay  =  diag{Xf).  The  gradient  of  Li  (Ay) 
with  respect  to  Ay,  denoted  by  VLi(Ay),  is  easy  to  derive,  which  is  omitted.  Following  the  Armijo’s 
rule  [16],  the  search  algorithm  for  Ay  is  as  follows: 

=  Xf^  +  f3^VLi{xf^) 

where  m  is  the  smallest  integer  satisfying 

-  Li  ||vLi(a5.^^) 


(15) 


Li 


P/- J^Aj+HA/^,gA,y  +  l)  >  0 


(16) 

(17) 


and  0  <  cj  <  1  and  0  <  /3  <  1.  After  convergence  of  the  above  search  for  a  fixed  t,  a  new  search  is 
started  with  an  increased  t.  When  1/t  becomes  small  enough,  the  search  for  Ay  is  considered  completed 
for  the  given  A^;. 


B.  Maximization  of  Sum  Rate  under  Power  Constraint  and  DPC  (Algorithm  3) 

ZFDPC  is  a  scalar  DPC,  which  is  suboptimal  compared  to  the  vector  DPC  [12],  [14],  [17].  From  now 
on,  the  vector  DPC  will  be  referred  to  as  DPC.  Civen  that  the  K  users  receive  independent  messages 
from  the  source,  we  can  write  the  transmitted  vector  from  the  source  as  x  =  xi  +  •  •  •  +  x/^-  and  its 
(source)  covariance  matrix  as  =  Hi  +  •  •  •  +  YIk  where  11  j  is  the  covariance  matrix  of  the  signal 
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Xj  meant  for  user  i.  Assuming  the  use  of  DPC  in  the  descending  order  starting  from  user  K,  i.e.,  the 
interference  from  user  j  to  user  i  for  j  >  i  is  virtually  absent,  the  achievable  data  rate  for  user  i  in 
bits/s/Hz  is  given  by 


Id,i  =  log2 


H,FH 

(Ej-.n.) 

H^F^Hf  +  HiFF^Hf  +  I 

H,FH 

H^F^Hf  +  HiFF^Hf  +  I 

(18) 


With  any  given  set  of  the  source  covariance  matrices  Ilj  for  i  =  a  complete  design  of  the 

vector  DPC  to  achieve  the  rates  in  (18)  can  be  made  by  following  [17]. 

In  the  absence  of  total  power  constraint,  the  maximum  possible  data  rate  for  user  i  is  independent  of 
Ilj  for  j  >  i  because  of  DPC.  We  can  formulate  the  following  problem: 


KN 


max 

A/,  A, 

Rsum,d  —  ^  ^  Id,i 
i 

(19) 

S.t. 

tr{Uf,}  <  Px 

(20) 

tr{F{UUxn^  +  I)F^}  <  Pf 

(21) 

A  joint  gradient  search  of  F,  Hi,  •  •  • ,  YIk  can  be  performed  directly  to  maximize  the  following  penalized 
function  of  (19): 


KN 

L2(F,  Ai,  •  •  •  ,  A^)  =  V  Id,i  +  -  log  (P,  -  tr{n  J)  +  -  log  {Pf  -  fr{F(Hn^H^  +  I)F^})  (22) 
^  tl  12 

i 

where  A*  is  such  that  flj  =  AjA^.  We  can  denote  all  parameters  in  F,  Ai,  •  •  •  ,  Kk  by  a  single  vector 
p,  and  the  gradient  of  L2  with  respect  to  p  by  VL2(p).  Similar  to  the  case  of  (14),  there  are  two  loops 
in  the  search.  The  inner  loop  is  for  a  fixed  pair  of  (fi,  f2)  where  the  Armijo  gradient  search  is  conducted 
until  the  norm  of  VL2(p)  is  small  enough.  The  outer  loop  corresponds  to  the  increase  of  (fi,f2)  until 
they  are  large  enough. 

To  show  an  explicit  expression  of  VL2(p)^  it  suffices  to  derive  explicit  expressions  of  and 
as  follows.  Following  the  rules  of  matrix  differentials  [18],  we  can  show 


dL2 

~W 

dL2 

dAi 


KN 

E 

i 

KN 

E 


F(Hn^H^  + 1) 


dld,i  _ 

dF  t2  Pf  -  tr{F(Hn^H^  +  I)F^} 


dld,t 

dA-i 


H^F^FHA, 


h  Px  -  tr{U^}  t2  Pf  -  fr{F(Hn,H^t  +  I)F^^} 


(23) 


(24) 


where  the  derivative  of  L2  with  respect  to  the  complex  matrix  F  is  defined  as  = 


aite{F} 


+ 


j  di^{F}  ’  same  applies  to  To  derive  and  we  first  define  Xj  and  Yj  according 


ai 


dh. 


to  (18)  such  that  Idf  =  log2  .  Then,  using  c)log|X|  =  fr{X  ^OX}  [18],  we  have  didf  = 
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(log2e)ir{X^  It  is  easy  to  derive  the  differentials  of  Xj  and  Yj  with  respect  to 

the  matrix  F.  Applying  the  resulting  expressions  into  the  above  expression,  it  follows  that 

dld,i  =  2(log2  e)Re  {tr  {Hf  -  Hf  Y-^N^aF^})  (25) 

where  M*  =  H^FH  (^$=1  +  H^F  and  N*  =  H^FH  (^5=1  +  H^F,  and  therefore 

^  =  2(log2  e)  (Hf  X-^M,  -  Hf  Y-^N,)  (26) 


Similarly,  one  can  verify  that  for  j  <  i  —  1, 
did, 


dAi 


=  2(log2  e)  (H^F^Hf  (X’^  -  Yr^)  H,FHA,) 


(27) 

and  for  j  =  i,  ^  =  2(log2e)  (H^F^Hf  X“^HiFHAj),  and  for  j  >  I  ^  =  0.  The  above 


algorithm  is  referred  to  as  Algorithm  3. 


C.  Minimization  of  Power  under  Rate  Constraint  (Algorithms  4-5) 

We  now  address  minimization  of  power  under  rate  constraint.  The  total  power  consumed  by  the  source 
and  the  relay  is 

Pd  =  fr{nj  +  fr{F(Hn,H^+l)F^}  (28) 

Our  problem  now  is  to  minimize  the  total  power  consumption  subject  to  rate  constraints: 

min  Pd  (29) 

F.iii,  .riic 

subject  to  Id,i  >  Vi  G  {1, 2,  •  •  •  ,  K}  (30) 

where  Ri  is  a  desired  data  rate  for  user  i  in  bits/s/Hz.  To  solve  this  problem,  we  can  search  for  the 
optimal  relay  matrix  F  and  the  optimal  source  covariance  matrices  Hi,  •  •  •  ,  11;^  in  an  alternate  fashion, 
where  each  cycle  of  the  alternation  is  shown  below. 

1 )  Source  optimization  with  fixed  F.-  We  now  assume  a  fixed  F  and  present  an  algorithm  for  computing 
the  optimal  Hi,  •  •  •  ,  Hr-  We  will  use  the  property  that  Id,i  is  independent  of  Ilj+i,  •  •  •  ,  Hr  and  is  a 
concave  function  of  Ilj,  and  P  is  a  linear  function  of  Hi,  •  •  •  ,  Ili^.  It  follows  from  (28)  that 

K 

Pd  =  Y.tr{Q,}  +  tr{FF^}  (31) 

i=l 


January  23,  2010 


DRAFT 


10 


where  Qi  =  (I  +  H^F^FH)^/2ni(I  +  H^F^FH)V2,  and  we  have  applied  tr(AB) 
Clearly,  Q,  and  Ilj  are  one-to-one  mappings  of  each  other.  We  now  define 


Gi  =  H*FH  (I  -f  H^F^FH 


.-HI2 


i—1 


Si  =  Gf  Gi  QjGf  +  HiFF^Hf  +  I  G* 


i=i 


fr(BA). 

(32) 

(33) 


where  Sj  depends  on  Qi,  ■  •  •  ,  Qi_i  hut  not  any  of  Qj,  •  •  •  ,  Qx-  Then,  it  follows  from  (18)  that 


Id,i  —  log2  |SjQi  -f  I|  (34) 

where  we  have  applied  log  |  AB  -f  I|  =  log  |BA  -|-  I|  with  AB  being  conjugate  symmetric. 

Based  on  (31)  and  (34),  the  optimal  solution  to  the  problem  (29)  for  Qj,  conditional  upon  F,  Qi,  •  •  •  ,  Qi_i, 
is  given  by  the  standard  water  filling  solution.  Namely,  if  fhe  eigenvalue  decomposifion  of  S*  is  denofed 
by  Sj  =  Xi^iUi^iuf^i  where  A*,/  >  0,  then  the  optimal  choice  of  Q*  is  Q*  = 

where  (x)"*"  =  max(x,0)  and  Vi  is  such  that  Id^i  =  Ri-  (Note:  In  order  to  keep  the  solution  inside  the 
interior  feasible  region  to  ensure  a  good  convergence  behavior,  we  should  choose  j  slightly  larger  than 
Ri.)  Furthermore,  with  a  fixed  F,  fhe  optimal  solution  for  Qi,  •  •  •  ,  (and  hence  Hi,  •  •  •  ,  Ili^)  can 
be  obtained  one  at  a  time  sequentially  by  starting  with  Hi. 

2)  Relay  optimization  with  fixed  Hi,  •  •  •  ,  TIk-  We  now  assume  that  Hi,  •  •  •  ,  YIk  are  fixed.  To  find 
fhe  opfimal  F,  we  can  use  fhe  gradienf  mefhod  fo  minimize  fhe  following  penalized  cosf  of  (29): 

L3  =  Pd-jY  -  Pi)  (35) 

i 

where  the  second  term  is  the  barrier,  and  both  and  j  are  functions  of  F.  With  the  gradient  ,  also 
denoted  by  VL3(F),  the  Armijo  search  algorithm  for  the  optimal  F  is  F(^+^)  =  /3™VL3(F^)  where 

m  is  the  smallest  integer  such  that  L3(F(^^)  —  L3  >  (T/3'”||VL3(F^)|P  and  R^i  —  Ri  > 

0,  Vf  G  {1, . . . ,  A}  where  0  <  /?  <  1  and  0  <  u  <  1.  Note  that  the  second  condition  I^^i  —Ri> 

0  is  important  to  ensure  that  none  of  the  rate  constraints  is  violated.  In  fact,  for  good  convergence  behavior, 
for  both  the  source  optimization  and  the  relay  optimization,  we  need  to  keep  F,ni,---  strictly 

inside  the  interior  feasible  region  of  the  problem. 

The  above  algorithm  for  power  minimization  is  referred  to  as  Algorithm  4.  Alternatively,  we  can  solve 
the  problem  (29)  by  a  joint  gradient  search  similar  to  Algorithm  3,  which  will  be  referred  to  as  Algorithm 
5. 
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IV.  Multiuser  MIMO  Uplink  Relay 

A  multiuser  MIMO  uplink  relay  system  is  illustrated  in  Fig.  2,  where  we  denote  by  G  the 

channel  matrix  from  user  i  to  the  relay,  and  by  ^  qMxM  channel  matrix  from  the  relay  to  the 

access  point.  Then,  we  write  the  received  signal  at  relay  as 

K 

Yr  =  ^  Hf  Xj  +  (36) 

i=l 

where  Xj  is  the  signal  transmitted  from  user  i,  and  is  the  white  Gaussian  noise  at  the  relay.  The  signal 
transmitted  from  the  relay  is 

r  =  (37) 


where  is  the  relay  matrix.  The  signal  received  at  the  access  point  is 

Yu  =  H^r  +  Uu 
K 

=  H^F^  ^  Hf  Xi  +  H^F^n^  +  (38) 

i=l 

where  n„  is  the  white  Gaussian  noise  at  the  access  point.  We  assume  the  use  of  successive  interference 
cancellation  (SIC)  at  the  access  point,  starting  from  user  K.  This  means  that  the  interference  from  user 
i  to  user  k  for  i  >  k  is  virtually  absent,  and  hence  the  achievable  data  rate  for  user  k  is 


^u,k 


H^F^ 

FH  +  H^F^FH  +  I 

H^F^ 

FH  +  H^F^FH  +  I 

(39) 


where  Ilj  =  F{xjX^}. 


A.  Maximization  of  Sum  Rate  under  Power  Constraint  (Algorithms  6-7) 

The  problem  of  maximizing  the  sum  rate  from  all  users  under  power  constraints  is  formulated  as 


follows: 


K 


max 

F.ni,--  ,yIk 
s.t. 


R. 


=  log2 


[J2t=i  Hf  n* Hi )  FH  +  H"  F"  FH  +  I 


ttH 


i=l 


|H^F^FH  +  I| 


tr{Ui}  <  Pi,yi  G  {1,2,---  ,K} 
tr  |f^  Hf  n,Hi  +  f|  <  F 


(40) 

(41) 

(42) 


Note  that  the  sum  rate  of  the  uplink  case  is  independent  of  the  order  of  SIC,  which  is  unlike  the  sum 
rate  of  the  downlink  case  with  DPC.  To  solve  this  problem,  we  can  optimize  each  of  F,  Hi,  •  •  •  ,  YIk  in 
a  cyclic  fashion.  The  basic  components  in  each  cycle  are  shown  below. 
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1)  Source  optimization  with  fixed  relay  and  other  sources:  If  all  F,  Hi,  •  •  •  ,  but  Ilj,  are  fixed, 
we  can  define  c  =  log2  |H^F^FH  + 1|  and 

G*  =  H^F^  I  UfUjUj  I  FH  +  H^F^FH  +  I 

which  are  independent  of  11^.  Then,  we  can  write 


Rsum,u  =  logajGi  +  H^F^HfniHiFHl -c 


=  log2 


I  +  G“^/^H^F^Hf  HiHiFHG”^/^ 


+  log2  |Gi|  -  c 


(43) 


The  power  constraint  (42)  is  equivalent  to 


tr  {F^Hf  HiH^F}  <  Pf  -  tr 


+  I  F 


It  should  be  clear  now  that  with  respect  to  11^  alone,  the  problem  (40)  is  equivalent  to  the  convex  problem 
(1)  which  is  solvable  by  the  GWF  algorithm. 

2)  Relay  optimization  with  fixed  sources:  If  Hi,  •  •  •  ,  YIk  are  fixed,  then  the  problem  (40)  with  respect 
to  F  alone  is  similar  to  a  problem  solved  in  [3],  the  solution  of  which  is  stated  below.  Define  the  SVD 
of  H  as  H  =  where  =  diag{ai,  ■  ■  ■  ,  aM)  with  descending  diagonal  order,  and  the  EVD 

of  R  =  as  R  =  where  =  diag{Xi,---  ,  Xm)  with  descending  diagonal 

order.  Then,  the  optimal  structure  of  F  is  given  by 


F  =  Er'EfU 


(44) 


where  =  diag{fi, . . . ,  >  0  which  are  to  be  determined.  With  (44),  the  problem  (40)  becomes 


M 


max 
/i>'"  Jm 


R. 


sum,u 


2  =  1 


(45) 


M 


s.f.  +  l)fi  <  Pf  and  /i  >  0  Vi 


2=1 


Then,  by  the  KKT  method  [15],  we  have 

1 


h  = 


2cj-  (1  +  Aj)  _ 


Y^A?  +  4Aj(T?  p  —  Xi  —  2 


(46) 


where  p  is  such  that 


"  1  r  / - 

V  Af  +  AXiaf  p-  Xi- 2 


i  2^2 
2=1  * 


=  Pf- 


The  above  algorithm  that  searches  for  F,  Hi,  •  •  •  ,  in  a  cyclic  fashion  is  referred  to  as  Algorithm 
6.  Note  that  each  component  in  Algorithm  6  is  a  convex  optimization.  Alternatively,  we  can  solve  the 
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problem  (40)  by  a  joint  gradient  search  over  F,  Hi,  •  •  •  ,  simultaneously,  which  will  be  referred  to 
as  Algorithm  7.  The  details  of  Algorithm  7  are  omitted  because  of  its  similarity  to  other  joint  gradient 
search  algorithms. 


B.  Minimization  of  Power  under  Rate  Constraint  (Algorithms  8-9) 
The  total  power  consumption  for  the  uplink  case  is: 


=  +  fr  I ^ nfUiUi  + F I 


(47) 


With  the  assumption  of  SIC,  the  individual  rate  Iu,i  for  user  i  is  given  by  (39).  Hence,  the  problem  is 
formulated  as: 


min  Pu 
F,ni,  ,yIk 

s.t.  Iu,i>Ri,  yie  {1,2,...K} 


(48) 

(49) 


The  problem  (48)  can  be  solved  by  a  joint  gradient  search  algorithm  (Algorithm  9)  which  is  omitted,  or 
an  alternate  optimization  algorithm  (Algorithm  8)  as  shown  below. 

1)  Source  optimization  with  fixed  relay:  Since  the  order  of  the  SIC  is  from  K  to  Iu,i  is  independent 
of  Ili+i, . . . ,  Ci-K,  which  is  a  property  also  shared  in  the  downlink  case.  With  hxed  F,  Hi,  •  •  •  ,  Ilj-i, 
the  optimal  11  j  can  be  found  by  a  convex  optimization  same  as  in  section  III-Cl. 

2)  Relay  optimization  with  fixed  sources:  Given  IIi,---  ,11^^,  the  optimal  F  can  be  found  by  the 
following  gradient  method.  Dehne  the  following  cost  with  a  barrier: 


L4  =  fr  I F^  nfUiUi  +  F I  -  ^  ^  log{Iu,i  -  Ri) 


It  follows  that 


K 


aF 


t  ^  luA  -  Ri  d¥ 


(50) 


(51) 


To  derive  we  first  rewrite  (39)  as  =  log2  .  Similar  to  the  derivation  of  (26),  it  can  be 

shown  that 


dlu,i 

dP 


=  2(log2e)  (QWriH^  -  Q_iWr\H^) 


(52) 


where  Cj  =  ( I  +  The  rest  of  the  algorithm  is  the  same  as  in  section  III-C2. 
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V.  Multi-Carrier  Extensions 


In  the  previous  sections,  we  have  assumed  that  there  is  a  single  carrier  for  power  allocation.  If  one 
wants  to  use  Me  (orthogonal)  carriers  for  joint  power  allocation,  the  previously  shown  algorithms  are 
also  applicahle  after  the  following  changes  of  notations  are  adopted. 

For  the  downlink  case,  the  signal  models  shown  in  (4)-(6)  hold  except  that 


X 

(53) 

y 

=  [y(ifu-- 

,y{Meff  G 

(54) 

n 

= 

,n{Mcff  G 

(55) 

H 

=  diag[ll{l), 

•••  ,H(Mc)] 

(56) 

r 

= 

(57) 

yi 

=  [y.(if,--- 

,y^{Me)T 

(58) 

n* 

= 

(59) 

=  diag[ili{l), 

,•••  ,Hi(Mc)] 

(60) 

;re  for  example  x(m)  denotes  the  signal  transmitted  from  the  access  point  on 

and  F  G  qMM^xMM^ 

the  mth  carrier.  Note  that  the  optimal  F  is  not  necessarily  block  diagonal.  In  other  words,  the  relay  may 
use  a  different  carrier  to  forward  a  stream  of  data  that  was  received  hy  the  relay  on  another  carrier  [10]. 
Good  (if  not  globally  optimal)  choices  of  F  along  with  the  source  covariance  matrices  at  all  carriers  can 
be  determined  by  any  of  the  power  allocation  algorithms.  For  the  uplink  case,  the  signal  models  shown 
in  (36)-(38)  also  hold  after  a  similar  change  of  definitions  of  the  notations. 

These  notational  changes  do  not  affect  any  of  the  algorithms  shown  in  this  paper  as  long  as  the  power 
constraint  is  for  the  sum  power  over  all  carriers  and  the  rate  of  interest  is  also  the  sum  rate  over  all 
carriers.  However,  the  complexity  of  these  algorithms  will  increase  because  of  the  increased  dimensions. 


VI.  Simulation  Results 

For  convenience  of  reference,  all  algorithms  presented  in  Sections  III  and  IV  are  summarized  in  Table 
I.  For  the  simulation  examples  shown  below,  a  sample  set  of  computational  times  of  all  algorithms  for  a 
random  channel  realization  and  a  random  initialization  are  listed  in  the  last  line  in  Table  I.  All  algorithms 
have  roughly  the  same  speed  except  Algorithm  1  which  uses  CVX  and  is  much  slower  than  others  for 
a  single  run.  Algorithm  1  uses  geometric  programming  as  proposed  in  [7],  for  which  the  GWF  is  not 
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applicable.  However,  unlike  other  algorithms,  Algorithm  1  is  globally  convergent  and  needs  no  multiple 
runs  associated  with  multiple  initializations.  When  multiple  runs  are  considered  for  other  algorithms,  they 
may  become  effectively  slower  than  Algorithm  1 .  However,  one  can  use  the  result  from  Algorithm  1  (for 
down  link  only)  as  an  initialization  for  Algorithm  2  for  a  new  research,  which  will  be  further  discussed 
later. 

Next,  we  show  simulation  examples  to  compare  these  algorithms.  We  assume  that  there  are  two  users 
K  =  2,  each  user  is  equipped  with  two  antennas  N  =  2,  the  relay  and  the  access  point  are  both 
equipped  with  four  antennas  M  =  4.  A  single  carrier  is  assumed.  Each  of  the  channel  parameters  is 
realized  independently  using  a  complex  Gaussian  distribution  with  zero  mean  and  unit  variance.  As 
assumed  throughout  this  paper,  every  entry  of  the  noise  vectors  has  zero  mean  and  unit  variance.  The 
performance  in  terms  of  either  the  sum  rate  or  the  total  power  is  based  on  an  average  over  50  channel 
realizations.  Our  experience  with  100  or  more  channel  realizations  did  not  lead  to  any  significant  change 
of  results.  Unless  mentioned  otherwise,  the  search  conducted  by  each  algorithm  (except  Algorithm  1 
which  is  globally  convergent)  was  initialized  randomly,  20  random  initializations  were  chosen  for  each 
realization  of  channel  matrices,  and  the  best  result  from  the  20  initializations  were  selected  for  computing 
the  performance.  We  have  found  that  the  performance  difference  between  the  “best”  and  “worst”  from  20 
initializations  can  be  up  to  20%.  In  general,  the  more  initializations  are  used,  the  better  is  the  chance  the 
optimal  solution  is  found.  But  the  computational  cost  increases  as  the  number  of  initialization  increases. 

Figure  3  compares  the  averaged  sum  rates  achieved  by  the  downlink  Algorithms  1-3  versus  the  relay 
power  Pf.  The  power  at  the  source  is  fixed  at  Pj;  =  1.  Algorithm  1  is  based  on  the  geometric  programming 
proposed  in  [7].  Both  Algorithms  1-2  are  based  on  ZFDPC  while  Algorithm  3  is  based  on  DPC.  For 
Algorithm  2,  there  are  two  curves  in  this  figure.  For  the  lower  curve,  we  used  the  results  from  Algorithm  1 
as  initializations  for  Algorithm  2.  For  the  upper  curve,  we  used  random  initializations.  We  see  that  except 
for  the  region  of  small  relay  power.  Algorithm  1  yielded  the  least  sum  rate  among  the  three  algorithms 
while  Algorithm  3  yielded  the  largest  sum  rate.  In  theory.  Algorithm  3  should  yield  the  largest  sum 
rate  for  the  entire  region  of  relay  power  if  a  global  optimum  (including  the  optimal  ordering  of  the 
DPC)  is  achieved.  This  figure  suggesfs  fhaf  in  the  small  relay  power  region.  Algorithm  3  was  trapped 
in  unfavorable  local  minima.  Since  ZFDPC  and  DPC  are  different  coding  schemes,  the  results  from 
Algorithms  1-2  cannot  unfortunately  be  used  as  good  initializations  for  Algorithm  3.  The  complexity  of 
DPC  is  much  more  complex  than  that  of  ZFDPC. 

Figure  4  compares  the  averaged  total  power  consumption  required  by  the  downlink  Algorithms  4-5 
versus  individual  rate  constraint.  Also  shown  in  this  figure  is  the  power  consumption  based  on  the  identity 
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relay  matrix,  i.e.,  F  =  I,  while  the  source  covariance  matrix  is  optimized  hy  the  source  optimization 
subroutine  in  Algorithm  4.  Algorithm  4  uses  cyclic  search  while  Algorithm  5  uses  joint  gradient  search. 
The  search  directions  for  cyclic  search  are  more  limited  than  the  joint  gradient  search.  We  see  that  when 
the  date  rate  is  high,  the  difference  of  power  consumption  is  very  large.  The  power  consumption  from 
Algorithm  5  is  the  least,  i.e.,  the  best. 

Figure  5  compares  the  averaged  sum  rates  achieved  by  the  uplink  Algorithms  6-7  versus  the  power 
constraint  at  the  relay.  The  source  power  is  fixed  at  P*  =  1  for  all  i.  It  turns  out  that  the  two  algorithms 
yield  the  same  results.  The  relay  optimization  and  the  source  optimization  in  Algorithm  6  (which  is 
cyclic)  are  both  convex,  and  Algorithm  7  uses  the  joint  gradient  search.  The  lower  curve  in  this  figure 
is  based  on  fhe  identity  relay  matrix,  i.e.,  F  =  I,  while  the  source  covariance  matrices  of  all  users  are 
optimized  by  the  source  optimization  subroutine  in  Algorithm  6. 

Figure  6  compares  the  averaged  total  power  consumption  required  by  the  uplink  Algorithms  8-9  versus 
a  common  data  rate  of  all  users.  Also  shown  in  this  figure  is  a  curve  based  on  the  identify  relay  matrix, 
i.e.,  F  =  I,  while  the  source  covariance  matrices  of  all  users  are  optimized  by  the  source  optimization 
subroutine  in  Algorithm  8.  In  this  case,  the  joint  gradient  search  by  Algorithm  9  yields  better  results  than 
the  cyclic  search  by  Algorithm  8. 

Finally,  Figure  7  illustrates  an  effect  of  joint  multi-carrier  power  allocation.  Here,  the  relay  system  is 
for  downlink,  there  are  two  users  {K  =  2),  each  user  has  two  antennas  {N  =  2),  there  are  four  antennas 
at  the  relay  node  and  four  antennas  at  the  access  point  (M  =  4),  and  there  are  two  carriers  (Me  =  2). 
For  each  of  the  two  carriers,  an  independent  channel  realization  was  made.  The  first  top  curve  is  the  sum 
rate  over  two  users  and  two  carriers,  which  was  obtained  by  the  joint  multi-carrier  power  allocation.  The 
second  top  curve  is  the  sum  rate  over  two  users  and  two  carriers,  which  was  obtained  by  two  separate 
single-carrier  power  allocations.  The  bottom  two  curves  are  the  sum  rates  each  summed  over  the  two 
users  for  carrier  1  and  carrier  2,  respectively.  The  total  power  for  the  two  carriers  used  for  the  first  curve 
is  twice  that  for  each  carrier  used  for  the  other  curves.  The  power  per  carrier  is  the  same  for  all  curves. 
We  see  that  there  is  an  improvement  of  the  sum  rate  by  using  joint  multi-carrier  power  allocation,  which 
is  expected.  However,  the  improvement  is  not  large.  It  is  known  that  the  distribution  of  the  singular 
values  of  a  matrix  of  i.i.d  random  variables  hardens  (becomes  invariant)  as  the  dimension  of  the  matrix 
increases  [19].  Hence,  if  the  number  of  antennas  at  each  node  becomes  large,  the  improvement  from  the 
joint  multi-carrier  power  allocation  is  expected  to  disappear. 
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VII.  Conclusion 

In  this  paper,  we  have  developed  several  computational  strategies  for  a  multiuser  MIMO  relay  system 
where  each  node  may  he  equipped  with  multiple  antennas.  The  complexities  of  these  algorithms  are 
about  the  same,  hut  their  performances  can  he  very  much  different.  Although  the  central  problem  is 
non-convex,  the  joint  gradient  search  for  the  relay  matrix  and  the  source  covariance  matrices,  with 
multiple  random  initializations,  has  consistently  yielded  the  best  result.  The  use  of  logarithmic  barrier 
functions,  which  is  a  key  approach  of  the  interior-point  optimization  methods,  has  been  very  effective 
for  constrained  optimizations.  But  for  one  case,  the  cyclic  (or  alternating)  search  for  the  relay  matrix  and 
the  source  covariance  matrices  yielded  results  similar  to  those  by  the  joint  gradient  search.  The  GWF 
algorithm  shown  in  this  paper  is  a  faster  alternative  to  the  CVX  algorithm  (or  package)  to  solve  the 
convex  problem  (1).  In  applications  with  practical  coding  methods,  the  rate-versus-power  model  of  each 
link  may  need  to  be  revised  with  simple  penalty  factors  while  the  power  allocation  algorithms  shown  in 
this  paper  are  still  applicable.  This  paper  has  shown  that  fast  algorithms  for  power  allocation  are  very 
important  to  achieve  the  full  potentials  of  MIMO  relay  systems  with  multiple-antenna  users. 


Appendix  A 
Proof  of  theorem  1 


For  any  Q  >  0  (i.e.,  positive  semi-definite),  we  can  write  Q  =  AA^  where  A  is  a  full  column  rank 
matrix.  With  respect  to  A  ,  we  can  write  the  following  Lagrangian  function  of  (1): 


L  =  -  log  |  /  +  HAA^H^I  +  {B,,AA^Bf }  -  Pi) 


(61) 


2=1 


The  gradient  of  L  with  respect  to  A  can  be  found  by  using  c)log|X|  =  tr(X  ^OX),  c)(XX^)  = 
(c)X)X^  -I-  and  other  basic  tools  [18].  The  result  is 


dL 


dL 


dL  ^ 

dAH  ~  dRe{AY  dIm{A)T 


-2A^  (  (I  +  HAA^H^)  ^  H  -  ^  fr^Bf  B 


(62) 


2=1 


Then,  the  complete  K.K.T.  conditions  [15]  of  the  problem  (1)  with  respect  to  A  can  be  written  as 


m 


- A^  (I  +  HAA^H^)  ^  H  -  ^  =  0 

(63) 

tr  {Bi A A^Bf}  -  Pi  <  0 

(64) 

Li>0 

(65) 

fii  {tr  {BjAA^Bf }  -  Pi)  =0 

(66) 
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where  i  =  1,  •  •  •  ,m. 

Although  the  prohlem  (1)  with  respect  to  A  is  not  convex,  we  now  show  that  the  generalized  KKT 
conditions  [15]  of  the  prohlem  (1)  with  respect  to  Q  >  0,  which  is  convex,  are  equivalent  to  (63)-(66). 
Consider  L  as  in  (61)  with  AA^  replaced  hy  Q.  It  follows  that 

FIT  1  ™ 

—  =  -H^  (I  +  HQH^)  n  +  Bi  (67) 

i=i 

We  define  a  vector  operator  for  a  complex  conjugate  symmetric  matrix  as  follows: 


uec(Q)  = 


vec(Re{Q}) 
vec{Im{Q}) 

Here,  uec(i?e{Q})  stacks  up  all  elements  from  72e{Q},  and  uec(/m{Q})  stacks  up  all  elements  from 


Im{Q}.  Assume  Q  G  Qnxn_  Then,  uec(Q)  G  Now,  based  on  (5.95)  in  [15],  we  have  the 


following  sufficient  generalized  KKT  conditions: 


uec  (I  +  HQH^)  ^  H  +  ^ j  -  cj  =  0  (68) 

tr  {BjQBf  }  -  Pi  <  0  (69) 

/ii  >  0  (70) 

/ri(p{BiQBf}-Pi)  =0  (71) 

a;^uec(Q)  =  0  (72) 


where  i  =  1,  •  •  •  ,m,  u  €  Also,  Q  G  /C  =  {Q'  |  Q'  >  0},  and  u:  is  in  the  dual  cone  of  JC,  i.e., 

cj  G  JC^  =  {ijL}\ui^ vec{(^')  >  0  VQ'>  0}.  The  term  —u  in  (68)  is  due  to  the  constraint  —  Q  <  0,  for 
which  we  have  used  =  I- 

Note  that  for  any  two  complex  conjugate  symmetric  and  positive  semi-definite  matrices  A'  and 
B',  the  following  equations  are  equivalent:  A'^B'  =  0  fr(A'^B')  =  0  Pe{A'}^Pe{B'}  + 

/m{A'}^Im{B'}  =  0  uec(A')^uec(B')  =  0.  It  is  then  easy  to  show,  similar  to  Example  2.24  in 
[15],  that  1C  =  JC^ .  Then,  as  long  as  >  0  and  Q  =  AA^,  we  have  that  (68)  implies  u  G  /C^,  (63) 
implies  (72)  and  vice  versa.  On  the  other  hand,  if  >  0  does  not  hold,  then  u  G  JC^  does  not  hold 
because  of  (68).  Therefore,  if  and  only  if  >  0,  (63)-(66)  are  equivalent  to  (68)-(72). 

Next,  we  construct  an  optimal  structure  of  Q  based  on  (63).  Since  KK^  =  ^  ™  ^  and  K  is 

non-singular,  (63)  is  equivalent  to 

- A^K  (I  -f  HK-^K^ AA^KK"^H^)  HK'^  -  l)  =  0  (73) 
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Define  the  SVD  of  HK  ^  as 

HK"-^  =  =  U 


Vi  V2 


H 


(74) 


where  U  and  V  are  square  unitary  matrices,  X)i  (square)  and  X)2  (possibly  non-square)  are  diagonal,  all 
the  diagonal  elements  of  X)i  are  larger  than  one,  and  all  the  diagonal  elements  of  X)2  are  less  than  or 
equal  to  one.  We  now  assume  that  K^A  =  ViT  where  T  is  non-singular.  Then,  (73)  is  equivalent  to 
the  following: 


-A^K  1  K-^H^  (I  -f  HK-^K^AA^KK^^H 


HK--^  -  I )  =  0 


-T^Vf  (I  -f  USV^ViTT^Vf  VS'^U^)  ^  -  l)  =  0 

/ 


^  -T 


H 


44  -T 


H 


s:i  0 


Si  0 


1  + 


TT 


H 


Si  0 


51-  I  0 


=  0 


1  ((TT^)-I  +  5]2)  U  0 


51-  I  0 


=  0 


-T^(((s2  0  )  -  51?  ((TT^)-i  +  S?)  o))-(l  o))v^  =  0 

S?  -  S?  (^(TT^)“^ -f  S?)  ^S?-I  =  0 

s?  -  s?  (sr^  -  sr'  (TT^  +  sr')-'  sr^)  s?  - 1  =  o 

TT^  =  I  -  S5“2 


(75) 


where  for  (c)  and  (/)  we  used  the  matrix  inverse  lemma.  We  see  that  since  TT^  =  I  —  S^^  >  0,  the 
above  solution  for  T,  and  hence  the  corresponding  A,  is  a  valid  solution. 

The  above  solution  of  K^A  has  the  same  span  as  Vi.  A  simple  observation  of  the  above  analysis 
also  suggests  that  as  long  as  the  span  of  K^A  belongs  to  that  of  Vi,  a  matrix  T  exists  such  that 
K^A  =  V?T  satisfies  (73)  where  V?  is  a  sub-matrix  (selected  columns)  of  Vi.  On  the  other  hand,  if 
the  span  of  K^A  contains  a  vector  from  V2,  i.e.,  K^A  =  V^T  where  V2  has  a  column  vector  from 
V2,  then  there  does  not  exist  such  a  matrix  T  for  A  to  satisfy  (73),  or  equivalently  the  corresponding 
“solution”  TT^  would  be  non-positive  semi-definite  which  contradicts  to  the  fundamental  nature  of 
TT^.  Therefore,  the  highest  rank  solution  of  A  to  satisfy  (73)  is  given  by  A  =  K'^ViT  where 
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T  =  (l  —  Equivalently,  the  highest  rank  solution  of  Q  to  satisfy  (73)  is  given  hy 

Q  =  AA^  =  K~'^ViTT^Vf  K”i 
=  K-^Vi(I  -  S:5“2)vf  K"! 

=  K"^V(I  -  (76) 


where  5]“^  =  (5]^X))~^,  the  inverse  of  a  zero  (squared  singular  value)  would  he  treated  as  positive 
infinity,  and  (I  —  X)~^)+  applies  (x)"*"  =  max(x,0)  on  eaeh  diagonal  element  of  itself. 

With  (76)  and  (67),  one  can  verify  that 


dL 


^  =  KV(l-S^(l  +  S(l-E-")-"s 


-1 


>  0 


(77) 


Note  that  the  fth  diagonal  element  of  the  diagonal  matrix  between  V  and  in  (77),  denoted  hy  di,  is 

d^  = 

(78) 


l-af>0  if  af  <  1 


0 


if  at  >  1 


where  ai  is  the  ith  diagonal  element  of  X).  If  we  did  not  use  the  highest  rank  solution  for  Q  as  in  (76), 
then  there  would  he  a  dj  =  1  —  fi?  <  0  associated  with  a  af  >  1  and  hence  (77)  would  not  hold  and 
hence  the  corresponding  u  from  (68)  would  not  belong  to  JC^. 

With  the  optimal  Q  given  in  (76),  which  is  a  function  of  /r  =  [/ri,  •  •  •  the  remaining  problem 

is  to  find  fhe  optimal  /r.  Since  the  effective  KKT  equations  for  /r  are  the  same  for  both  (63)-(66)  and 
(68)-(72),  the  optimal  can  be  found  by  using  either  the  dual  problem  of  (1)  with  respect  to  A  or  the 
dual  problem  of  (1)  with  respect  to  Q.  Choosing  the  former,  we  can  find  fhe  optimal  /r  by  solving  (3). 
The  dual  problem  of  (1)  with  respect  to  Q  is  the  same  as  (3)  except  for  the  additional  term  —vec^{Q)u! 
which  is  however  maximized  to  zero  by  u  for  any  /r. 

The  proof  of  the  theorem  is  completed.  In  the  next  section,  we  show  how  to  find  the  optimal  fi  in 
more  details.  For  the  primal  problem  (1),  Q  has  real  elements.  (Even  under  the  constraint  Q  =  Q^, 
Q  has  free  real-part  elements,  free  imaginary-part  elements,  and  hence  total  free  real 

elements.)  For  the  dual  problem  (3),  there  are  m  real  variables  in  /x.  If  m  <  n^,  it  is  reasonable  to  expect 
the  dual  problem  to  be  less  costly  to  solve. 
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Appendix  B 

Computation  of  the  Dual  Problem  in  Theorem  1 

Since  the  dual  problem  is  convex,  we  can  follow  the  interior-point  method  [15]  and  define  the  following 
dual  function  with  logarithmic  barrier  terms: 

m  . 

D(/r)  =  -  log  |I  +  HQ(/r)H^|  +  -  P,)  +  ^  log  (79) 

1  ^ 

2=1  2 

where  we  use  Q(/t)  to  stress  that  Q  is  a  function  of  /r.  Note  that  the  first  two  terms  in  (79)  equal 
to  minQ>o  L,  which  we  want  to  maximize  subject  to  /x  >  0.  For  each  choice  of  t,  we  can  apply  the 
Newton’s  method  [15]  to  find  the  optimal  /x,  i.e., 

^(fc+i)  ^  ^(fc)  +  ^(^)(v2p)(/xW))-iVP(/xl^l)  (80) 

where  k  denotes  the  iteration  index  and  the  scalar  7!^!  is  determined  by  the  backtracking  line  search. 
Upon  convergence  for  each  t,  we  can  increase  f  by  a  factor  (5  >  1  and  continue  a  new  cycle  of  the 
Newton’s  search.  The  above  process  continues  until  1/f  is  smaller  than  a  pre-specified  number  e. 

The  computation  of  the  gradient  vector  Vi9(/xl^l)  and  the  Hessian  matrix  V^i9(/xl^l)  is  straightforward 
although  the  detailed  expressions  are  lengthy.  Since  Q(/x)  depends  on  the  eigenvalue  decomposition  of 
and  the  computation  of  K  =  1  also  needs  the  eigenvalue  decompo¬ 

sition  of  X]™  1  k-iBfBi,  we  need  to  use  the  first-order  and  second-order  differentials  of  eigenvalues  and 
eigenvectors.  The  basic  formulas  for  these  differentials  can  be  found  in  [18].  The  detailed  expressions  of 
the  gradient  and  the  Hessian  are  omitted  to  save  space. 

To  avoid  possible  numerical  problems  in  computing  the  differentials  of  eigenvectors  when  there  are 
multiple  identical  eigenvalues,  we  added  a  small  random  perturbation  matrix  to  Xll^i  F*Bf^Bj  in  our 
program,  which  proved  to  be  very  effective.  A  complete  Matlab  script  of  the  GWF  algorithm  is  available 
at  http://www.ee.ucr.edu/-yhua/GWF.pdf. 


Appendix  C 

A  Comparison  of  GWF  and  CVX 

To  show  a  comparison  of  our  GWF  algorithm  with  CVX  in  [13],  we  ran  both  algorithms  on  a  desktop 
with  2A0GHz  CPU.  We  chose  Pi  =  1,  P2  =  1.5,  Bi  =  I,  and  used  the  complex  Gaussian  distribution 
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with  zero  mean  and  unit  varianee  to  randomly  choose  each  element  in  the  following  matrices: 

/  -0.6705 +  0.3791i  0.1469  +  0.4499i  -0.2913  -  0.3867i  0.1568  -  0.0536i  ^ 

0.2398  -  0.3460i  -0.0702  -  1.0615i  -0.4482  +  0.0759i  -1.0125  +  0.5067i 

-0.8170 +  0.3401Z  -0.5652  +  0.1424i  0.1243  -  0.1684i  0.2645  -  0.2377z 

y  -0.7213  -  0.5363i  -0.1463  -  0.3667i  -0.7448  +  0.4854i  0.1717  +  0.0345i 

/  0.1993 +  0.1027i  -0.6859  +  0.4280i  0.1457  +  0.3800i  0.2031  +  0.5548i  ^ 


H 


Bo  = 


0.5582  +  0.2944i  -0.3429  -  0.4255i  0.5535  -  0.8565i  0.6080  -  0.5549i 
0.3102  -0.1320i  0.1658 +  0.4059Z  0.1225  +  0.7685i  0.7242  +  0.1927i 


Y  -0.1438 +  1.2477i  -0.4989  +  0.3501i  0.0825  -  0.8049i  -0.5126  +  0.4826i 

For  the  GWF  algorithm,  the  initial  elements  of  were  randomly  chosen  between  zero  and  10“^.  We 
chose  V D{fj,)'^ D{fx)  <  10“^  as  the  stopping  criterion  for  the  inner  loop  (for  fixed  t). 
We  also  chose  =  2  and  =  2t^^\  and  finally  2/t  <  10“^  as  the  stopping  criterion  for  the  outer 

loop.  We  noticed  that  for  each  t,  the  inner  loop  converged  after  about  8  iterations. 

At  the  convergence,  the  following  results  from  the  GWF  algorithm  and  the  CVX  algorithm  were 
obtained: 

\ 


Qgwf  = 


(  0.3726 

0.1804 +  0.0634Z 


0.0470  -  0.0795f  -0.1740  -  0.0078f 

-0.0779  -0.1381Z  -0.1265  -  0.1644f 


0.0893  +  0.0208f 
0.1909 


Q,cvx  = 


/  0.3726 

0.1804 +  0.0634i 


/ 

\ 


-0.0779  -  0.1382i  -0.1265  -  0.1644f 
0.0894  +  0.0208f 
0.1909 


0.1804  -  0.0634i 
0.2722 

0.0470  +  0.0795Z  -0.0779  +  0.1381z  0.1643 

y  -0.1740 +  0.0078i  -0.1265  +  0.1644z  0.0893  -  0.0208f 

0.1804  -  0.0634i  0.0469  -  0.0796f  -0.1739  -  0.0078f 

0.2722 

0.0469  +  0.0796Z  -0.0779  +  0.1382z  0.1643 

y  -0.1739 +  0.0078i  -0.1265  +  0.1644z  0.0894  -  0.0208f  0.1909  J 

These  two  matrices  agree  with  each  other  very  well.  Both  GWF  and  CVX  achieve  the  same  value  of 
capacity  2.6139  in  bits/s/Hz  (i.e.,  —  J  in  (1)).  But  GWF  took  3.40  seconds  while  CVX  took  14.94  seconds. 
GWF  is  about  four  times  faster  than  CVX.  Note  that  the  dimension  of  Q  used  here  is  larger  than  that 
used  for  Algorithms  2  and  6  shown  in  Table  I 

Figure  8  shows  how  /x  of  the  GWF  converged  to  the  optimal  as  the  outer  iterations  continued.  We 
see  that  yL2  approaches  to  zero,  which  means  that  the  second  power  constraint  is  satisfied  automatically 
while  the  first  power  constraint  is  active.  Figure  9  illustrates  the  capacity  (— J)  as  function  of  the  barrier 
constant  t. 
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TABLE  I 

Summary  of  power  allocation  algorithms  for  a  multiuser  MIMO  relay  system.  The  sample  run  times 
WERE  BASED  ON  A  DESKTOP  WITH  2.40GHZ  CPU,  TWO  USERS  EACH  WITH  TWO  ANTENNAS  AND  A  RELAY  WITH  FOUR 

ANTENNAS. 


Alg.  1 

Alg.  2 

Alg.  3 

Alg.  4 

Alg.  5 

Alg.  6 

Alg.  7 

Alg.  8 

Alg.  9 

Section  No. 

III-A 

III-A 

III-B 

III-C 

III-C 

IV-A 

IV-A 

IV-B 

IV-B 

Downlink 

/ 

/ 

/ 

/ 

/ 

Uplink 

/ 

/ 

/ 

/ 

Max  Rate 

/ 

/ 

/ 

/ 

/ 

Min  Power 

/ 

/ 

/ 

/ 

ZFDPC 

/ 

/ 

DPC 

/ 

/ 

/ 

SIC 

/ 

/ 

/ 

/ 

Cyclic  Search 

/ 

/ 

/ 

/ 

Joint  Search 

/ 

/ 

/ 

/ 

/ 

Use  of  GWF 

/ 

/ 

Sample  Run  Time  in  Sec 

17.10 

5.12 

4.38 

7.44 

6.32 

8.15 

6.91 

4.18 

3.92 

Fig.  1.  Diagram  of  a  multiuser  MIMO  relay  downlink  system. 


Hr 


Fig.  2.  Diagram  of  a  multiuser  MIMO  relay  uplink  system. 
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Fig.  3.  Comparison  of  downlink  Algorithms  1-3:  Averaged  sum  rate  versus  power  constraint  at  relay.  Alg.  2-A 
is  Algorithm  2  using  the  best  out  of  20  random  initializations.  Alg.  2-B  is  Algorithm  2  using  the  results  from 
Algorithm  1  as  initializations. 


Fig.  4.  Comparison  of  downlink  Algorithms  4-5:  Averaged  total  power  consumption  versus  individual  rate  constraint. 
The  curve  on  the  top  is  for  the  identity  relay  matrix. 
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Fig.  5.  Comparison  of  uplink  Algorithms  6-7:  Averaged  sum  rate  versus  relay  power  constraint.  The  curves  for 
Algorithms  6-7  are  identical.  The  lower  curve  is  for  the  identity  relay  matrix. 


Fig.  6.  Comparison  of  uplink  Algorithms  8-9:  Averaged  total  power  consumption  versus  individual  rate  constraint. 
The  curve  on  the  top  is  for  the  identity  relay  matrix. 
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Fig.  7.  An  example  of  joint  multi-carrier  power  allocation  for  downlink  multi-user  MIMO  relay  system  where 
K  =  2,  N  =  2,  M  =  A  and  Me  =  2.  Algorithm  3  was  applied  with  20  random  initializations.  The  rates  shown  are 
based  on  a  single  channel  realization  for  each  of  the  two  carriers. 


Fig.  8.  Optimal  values  of  pi  and  ^2  as  function  of  the  outer  loop  index  n\nt  =  2”. 


January  23,  2010 


DRAFT 


28 


4  6  8  10 

outer  iterations:  t=2'' 


Fig.  9.  Optimal  value  of  —  J  (capacity)  as  function  of  the  outer  loop  index  n  in  f  =  2". 


PLACE 

PHOTO 

HERE 


Yuan  Yu  received  B.E.  in  Automation  from  Nankai  University,  Tianjin,  China,  in  2002,  M.S.  in  Automatic 
Control  from  Institute  of  Automation,  Chinese  Academy  of  Sciences,  Beijing,  China,  in  2005  and  Ph.D. 
degree  in  electrical  engineering  from  University  of  California  Riverside,  Riverside,  CA,  in  2009. 

His  research  interests  include  cooperative  wireless  communications,  wireless  multi-hop  networks,  mul¬ 
tiuser  MIMO  communication  systems. 


January  23,  2010 


DRAFT 


29 


Yingbo  Hua  (S’86-M’88-SM’92-F’02)  received  B.S.  degree  in  Control  Engineering  from  Nanjing  Institute 
of  Technology  (the  predecessor  of  Southeast  University),  Nanjing,  China,  in  Feh.  1982,  and  M.S.  and  Ph.D. 
degrees  in  Electrical  Engineering  from  Syracuse  University,  Syracuse,  NY,  in  1983  and  1988,  respectively. 

Erom  1988  to  1989,  he  was  a  research  fellow  at  Syracuse,  consulting  for  Syracuse  Research  Co.,  NY, 
and  Aeritalia  Co.,  Italy.  He  was  Lecturer  from  Feh.  1990  to  1992,  Senior  Lecturer  from  1993  to  1995, 
and  Reader  and  Associate  Professor  from  1996  to  2001,  with  the  University  of  Melbourne,  Australia. 
He  served  as  a  visiting  professor  with  Hong  Kong  University  of  Science  and  Technology  from  1999  to  2000,  and  consulted 
for  Microsoft  Research  Co.,  WA,  summer  2000.  Since  Feb.  2001,  he  has  been  Professor  of  Electrical  Engineering  with  the 
University  of  California,  Riverside,  CA. 

He  is  an  author/coauthor  of  numerous  articles  in  journals,  conference  proceedings  and  books,  which  span  the  fields  of  sensor 
array  signal  processing,  channel  and  system  identification,  wireless  communications  and  networking,  and  distributed  computations 
in  sensor  networks.  He  is  a  co-editor  of  Signal  Processing  Advances  in  Wireless  and  Mobile  Communications,  Prentice-Hall, 
2001,  and  High-Resolution  and  Robust  Signal  Processing,  Marcel  Dekker,  2003.  He  has  served  on  the  Editorial  Boards  for  IEEE 
Transactions  on  Signal  Processing,  IEEE  Signal  Processing  Letters,  IEEE  Signal  Processing  Magazine,  and  Signal  Processing 
(EURASIP).  He  also  served  on  IEEE  SPS  Technical  Committees  for  Underwater  Acoustic  Signal  Processing,  Sensor  Array  and 
Multi-channel  Signal  Processing,  and  Signal  Processing  for  Communications  and  Networking,  and  on  numerous  international 
conference  organization  committees. 


PLACE 

PHOTO 

HERE 


January  23,  2010 


DRAFT 


